# Low WER
Whosper Large V2
Apache-2.0
Whosper-large-v2 is a speech recognition model designed specifically for Wolof, the most widely spoken language in Senegal. Built on OpenAI's Whisper-large-v2, it significantly reduces Word Error Rate (WER) and Character Error Rate (CER).
Speech Recognition Supports Multiple Languages
CAYTU
449
6
Whisper Hindi2Hinglish Swift
Apache-2.0
A speech recognition model for mixed Hindi and Hinglish speech, built on the Whisper architecture and tuned for Indian accents and noisy environments.
Speech Recognition
Transformers Supports Multiple Languages

Oriserve
496
6
Viwhisper Medium
MIT
A Whisper-medium model for Vietnamese speech recognition, fine-tuned on 1,308 hours of Vietnamese data.
Speech Recognition
Transformers Other

NhutP
139
4
Parakeet Ctc 0.6b
Parakeet CTC 0.6B is an automatic speech recognition model jointly developed by NVIDIA NeMo and Suno.ai, based on the FastConformer architecture with approximately 600 million parameters, supporting English speech transcription.
Speech Recognition English
nvidia
6,528
13
Parakeet Rnnt 0.6b
Parakeet RNNT 0.6B is an automatic speech recognition model jointly developed by NVIDIA NeMo and Suno.ai, based on the FastConformer architecture with approximately 600 million parameters, specifically designed for transcribing English speech into text.
Speech Recognition English
nvidia
92.27k
8
Parakeet Ctc 1.1b
Parakeet CTC 1.1B is an automatic speech recognition model jointly developed by NVIDIA NeMo and Suno.ai, based on the FastConformer architecture with approximately 1.1 billion parameters, supporting English speech transcription.
Speech Recognition English
nvidia
14.78k
29
Whisper Large V3 French
MIT
A French automatic speech recognition model fine-tuned from OpenAI Whisper-large-v3. Its transcripts include casing, punctuation, and numbers.
Speech Recognition
Transformers French

bofenghuang
771
28
Asr Whisper Medium Commonvoice Ar
Apache-2.0
A Whisper medium speech recognition model fine-tuned on the CommonVoice Arabic dataset, developed by the SpeechBrain team
Speech Recognition Arabic
speechbrain
17
2
Stt En Fastconformer Transducer Xlarge
The NVIDIA FastConformer-Transducer is a high-performance model for English automatic speech recognition (ASR), utilizing an optimized FastConformer architecture and Transducer decoder with approximately 618 million parameters.
Speech Recognition English
nvidia
106
24
Stt En Fastconformer Ctc Xlarge
NVIDIA FastConformer-CTC XLarge is an Automatic Speech Recognition (ASR) model with approximately 600 million parameters, designed specifically for English speech transcription and trained using the FastConformer architecture and CTC loss.
Speech Recognition English
nvidia
216
2
Whisper Small Cv11 French
Apache-2.0
A French automatic speech recognition model fine-tuned from openai/whisper-small on the Common Voice 11.0 French dataset. Its transcripts include casing and punctuation.
Speech Recognition
Transformers French

bofenghuang
266
4
Stt Rw Conformer Transducer Large
This is a large Conformer-Transducer model for Kinyarwanda speech recognition, which can transcribe speech into lowercase Latin letters, supporting spaces and apostrophes.
Speech Recognition Other
nvidia
116
1
Stt Es Conformer Transducer Large
This is a large Conformer-Transducer model for Spanish automatic speech recognition, with approximately 120 million parameters, trained on 1340 hours of Spanish speech data.
Speech Recognition Spanish
nvidia
708
4
Stt De Conformer Transducer Large
This is a large Conformer-Transducer model for German automatic speech recognition, with approximately 120 million parameters, supporting the transcription of German speech into text.
Speech Recognition German
nvidia
66
6
Stt De Conformer Ctc Large
This is a large-scale Conformer-CTC model for German automatic speech recognition, trained and optimized by NVIDIA on thousands of hours of German speech data.
Speech Recognition German
nvidia
132
4
Wav2vec2 Large Xlsr 53 Chinese Zn Cn Aishell1
Apache-2.0
A Chinese speech recognition model based on facebook/wav2vec2-large-xlsr-53 and fine-tuned on the AISHELL-1 dataset.
Speech Recognition
Transformers Chinese

qinyue
22
6
Wav2vec2 Large Xlsr 53 German Cv9
Apache-2.0
This is an automatic speech recognition (ASR) model fine-tuned on the German Common Voice 9.0 dataset, based on Facebook's wav2vec2-large-xlsr-53 model.
Speech Recognition
Transformers German

oliverguhr
98
1
Wav2vec2 Base Vietnamese 160h
Vietnamese speech recognition model based on Wav2vec2, fine-tuned on 160 hours of Vietnamese speech data
Speech Recognition
Transformers Other

khanhld
356
10
Wav2vec2 Base Da Ft Nst
Apache-2.0
Danish speech recognition model fine-tuned on the NST dataset, supporting 16kHz sampled audio input
Speech Recognition
Transformers Other

Alvenir
15
3
Wav2vec2 Large Xlsr Turkish
Apache-2.0
This is an automatic speech recognition model based on facebook/wav2vec2-large-xlsr-53 and fine-tuned on the Turkish Common Voice dataset, achieving a test WER of 21.13%.
Speech Recognition Other
cahya
61
2
Bp500 Base100k Voxpopuli
Apache-2.0
Speech recognition model optimized for Brazilian Portuguese, trained with 453 hours of audio from 7 public datasets
Speech Recognition
Transformers Other

lgris
23
1
Wav2vec2 Large Xlsr Sundanese
Apache-2.0
A Sundanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on high-quality TTS data from OpenSLR
Speech Recognition Other
cahya
339
0
Asr Wav2vec2 Commonvoice Fr
Apache-2.0
wav2vec 2.0 speech recognition model trained on the CommonVoice French dataset, using CTC/Attention architecture without requiring a language model
Speech Recognition French
speechbrain
250
10
Bp400 Xlsr
Apache-2.0
A Wav2vec 2.0 automatic speech recognition model fine-tuned on Brazilian Portuguese datasets.
Speech Recognition
Transformers Other

lgris
55
3
Wav2vec2 Large Xlsr Eo
Apache-2.0
A speech recognition model fine-tuned for Esperanto using the Common Voice dataset, based on the facebook/wav2vec2-large-xlsr-53 model.
Speech Recognition Other
gchhablani
23
1
Wav2vec2 Large Xlsr Open Brazilian Portuguese V2
Apache-2.0
This is a Wav2vec2 model optimized for Brazilian Portuguese, trained on multiple open datasets for automatic speech recognition tasks.
Speech Recognition
Transformers Other

lgris
1,825
18
Wav2vec2 Large Xlsr Open Brazilian Portuguese
Apache-2.0
This is a Wav2vec 2.0 model fine-tuned for Brazilian Portuguese, trained using multiple open Brazilian Portuguese datasets including Common Voice, MLS, CETUC, etc.
Speech Recognition
Transformers Other

lgris
395
9
Wav2vec2 Base Cynthia Tedlium 2500 V2
Apache-2.0
This speech recognition model is facebook/wav2vec2-base-960h fine-tuned on the TED-LIUM dataset, achieving a word error rate of 20.33% on the evaluation set.
Speech Recognition
Transformers

huyue012
25
0
Bp500 Xlsr
Apache-2.0
This is a Wav2vec 2.0 model fine-tuned for Brazilian Portuguese, trained on multiple Brazilian Portuguese datasets, achieving a WER of 13.6 on the Common Voice test set.
Speech Recognition
Transformers Other

lgris
21
1
Wav2vec2 Live Japanese
Apache-2.0
A Japanese speech recognition model fine-tuned based on facebook/wav2vec2-large-xlsr-53, supporting hiragana output
Speech Recognition
Transformers Japanese

ttop324
20
4
Wav2vec2 Large Xlsr 53 Es
Apache-2.0
A speech recognition model fine-tuned on the Spanish Common Voice dataset based on Facebook's wav2vec2-large-xlsr-53 model, with a test WER of 10.50%.
Speech Recognition
Transformers Spanish

pcuenq
147
0
Wav2vec2 Large Xlsr 53 Esperanto
Apache-2.0
This is an Esperanto speech recognition model fine-tuned from Facebook's wav2vec2-large-xlsr-53 model, trained using the Common Voice dataset.
Speech Recognition Other
cpierse
8,681
6
Xls R Nl V1 Cv8 Lm
This is an automatic speech recognition model based on the XLS-R architecture, specifically optimized for Dutch and Flemish, incorporating a 5-gram language model to improve recognition accuracy.
Speech Recognition
Transformers Other

FremyCompany
14
3
Galician Xlsr
Apache-2.0
This automatic speech recognition model is based on facebook/wav2vec2-xls-r-300m and fine-tuned on Galician data, achieving a WER of 11.31% on the Common Voice 8.0 test set.
Speech Recognition
Transformers Other

Akashpb13
110
1
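Several entries above cite WER and CER figures. As a rough illustration of how these metrics are commonly defined (the edit distance between a reference transcript and a model's hypothesis, divided by the reference length), here is a minimal Python sketch. The function names are illustrative and are not taken from any of the listed models. Real evaluations usually also normalize casing and punctuation before scoring, which this sketch does not attempt.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    # prev[j] holds the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(f"WER: {wer('the cat sat on the mat', 'the cat sit on mat'):.2%}")
    print(f"CER: {cer('abc', 'abd'):.2%}")
```

Because the denominator is the reference length, both rates can exceed 100% when the hypothesis contains many insertions, so figures from different test sets or normalization schemes are not directly comparable.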